Recursive neural networks (RNN) and their recently proposed extension, recursive long short-term memory networks (RLSTM), are models that compute representations for sentences by recursively combining word embeddings according to an externally provided parse tree. Both models thus, unlike recurrent networks, explicitly make use of the hierarchical structure of a sentence. In this paper, we demonstrate that RNNs nevertheless suffer from the vanishing gradient and long distance dependency problems, and that RLSTMs greatly improve over RNNs on these problems. We present an artificial learning task that allows us to quantify the severity of these problems for both models. We further show that a ratio of gradients (at the root node and a focal leaf node) is highly indicative of the success of backpropagation at optimizing the relevant weights low in the tree. This paper thus provides an explanation for existing, superior results of RLSTMs on tasks such as sentiment analysis, and suggests that the benefits of including hierarchical structure and of including LSTM-style gating are complementary.
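To make the gradient-ratio diagnostic concrete, the following is a minimal sketch (not the paper's code) of a plain RNN that composes word embeddings bottom-up along a hard-coded binary parse tree and then compares gradient norms at a focal leaf and at the root; the names (`compose`, `d`, the toy tree) and the tanh composition function are illustrative assumptions.

```python
import torch

d = 10
W = torch.randn(d, 2 * d) * 0.1   # shared composition weights (small scale)
b = torch.zeros(d)

def compose(node, embeddings):
    """Representation of `node`: a word index (leaf) or a (left, right) pair."""
    if isinstance(node, int):                      # leaf: look up embedding
        return embeddings[node]
    left, right = node
    l = compose(left, embeddings)
    r = compose(right, embeddings)
    return torch.tanh(W @ torch.cat([l, r]) + b)   # p = tanh(W [l; r] + b)

# Toy 4-word sentence; word 0 is the focal leaf, deepest in the tree.
embeddings = torch.randn(4, d, requires_grad=True)
tree = (((0, 1), 2), 3)                            # left-branching parse

root = compose(tree, embeddings)
root.retain_grad()                                 # keep the gradient at the root
loss = root.sum()                                  # stand-in for a task loss
loss.backward()

# Ratio of gradient norms at the focal leaf vs. the root: the smaller the
# ratio, the more the error signal has vanished on its way down the tree.
ratio = embeddings.grad[0].norm() / root.grad.norm()
print(f"leaf/root gradient-norm ratio: {ratio.item():.4f}")
```

With small composition weights and a deep attachment of the focal leaf, this ratio shrinks rapidly with tree depth, which is the vanishing-gradient behavior the abstract attributes to plain RNNs; an LSTM-style composition with gating would be expected to keep it closer to one.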